Detección de Opinion Spam usando PU-Learning
Abstract
Detecting whether opinions about a product or service are true or false has become a very important problem. Recent studies show that up to 80% of people have changed their final decision on the basis of opinions they read on the web. Some of these opinions may be false: positive ones written to promote a product or service, or negative ones written to discredit it. To help solve this problem, this thesis proposes a new method for detecting false opinions, called PU-Learning*, which increases precision by means of an iterative algorithm. It also addresses the lack of labeled opinions: the method needs only a small set of opinions labeled as positive and a large set of unlabeled opinions, a scenario that is very common in the available corpora. From the unlabeled set, the missing negative opinions are extracted and used to train a two-class (binary) classifier. As a second contribution, we propose a representation based on character n-grams. This representation captures both content and writing style, which improves the effectiveness of the proposed method for detecting false opinions. The method was evaluated in three opinion classification experiments on two different collections. The results of each experiment show the effectiveness of the proposed method as well as the differences between several types of attributes. Because the veracity of user reviews is a very important factor in decision making, the method presented here can be applied to any corpus with the characteristics described above.
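The two ideas in the abstract, character n-gram features and iterative extraction of negatives from the unlabeled set, can be illustrated with a minimal sketch. This is not the thesis implementation. The classifier (a tiny multinomial Naive Bayes), the stopping rule (stop when no unlabeled item is reclassified or when none would remain), and the names `char_ngrams`, `NaiveBayes`, and `pu_learning` are all assumptions chosen for illustration.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Return the bag of overlapping character n-grams of `text` (lowercased)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


class NaiveBayes:
    """Tiny multinomial Naive Bayes over n-gram counts (illustrative stand-in classifier)."""

    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.n_docs = Counter(labels)
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}
        return self

    def predict(self, doc):
        scores = {}
        for y in (0, 1):
            # Log prior plus Laplace-smoothed log likelihood of each known n-gram.
            score = math.log(self.n_docs[y] / sum(self.n_docs.values()))
            denom = self.totals[y] + len(self.vocab)
            for gram, k in doc.items():
                if gram in self.vocab:
                    score += k * math.log((self.counts[y][gram] + 1) / denom)
            scores[y] = score
        return 1 if scores[1] > scores[0] else 0


def pu_learning(positive, unlabeled, n=3, max_iter=10):
    """Iteratively shrink the unlabeled set to a set of assumed negatives.

    Assumed loop: treat the remaining unlabeled items as negative, train,
    drop the ones the classifier now labels positive, and repeat.
    """
    P = [char_ngrams(t, n) for t in positive]
    U = [char_ngrams(t, n) for t in unlabeled]
    negatives = list(range(len(U)))  # indices of U currently treated as negative

    def train(indices):
        docs = P + [U[i] for i in indices]
        labels = [1] * len(P) + [0] * len(indices)
        return NaiveBayes().fit(docs, labels)

    for _ in range(max_iter):
        clf = train(negatives)
        kept = [i for i in negatives if clf.predict(U[i]) == 0]
        if not kept or len(kept) == len(negatives):
            break  # nothing would remain, or nothing changed
        negatives = kept

    # Final two-class classifier: labeled positives vs. extracted negatives.
    return train(negatives), negatives
```

Calling `pu_learning(positive_texts, unlabeled_texts)` returns the final classifier together with the indices of the unlabeled opinions kept as negatives. Following the abstract, "positive" here means the labeled class of interest, so the labeled set would hold whichever kind of opinion the corpus provides labels for.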
Journal: Procesamiento del Lenguaje Natural
Volume: 58
Publication date: 2017